@codeflash-ai codeflash-ai bot commented Dec 2, 2025

📄 34% (0.34x) speedup for Tooltips._pseudo_css in pandas/io/formats/style_render.py

⏱️ Runtime : 184 microseconds → 137 microseconds (best of 250 runs)
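
The headline figure is consistent with measuring the speedup relative to the optimized runtime. A minimal sketch of that arithmetic, assuming this is the formula behind the reported number (the helper name `speedup` is hypothetical and not part of Codeflash):

```python
def speedup(original_us: float, optimized_us: float) -> float:
    """Return the fractional speedup relative to the optimized runtime."""
    return (original_us - optimized_us) / optimized_us


# Runtimes reported above: 184 microseconds before, 137 microseconds after.
ratio = speedup(184, 137)
print(f"{ratio:.2f}x, {ratio:.0%} faster")  # prints "0.34x, 34% faster"
```

The per-test percentages further down are computed from unrounded timings, so recomputing them from the displayed microsecond values may differ slightly in the last digit.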

📝 Explanation and details

The optimized code achieves a 34% speedup by eliminating redundant string concatenation operations in the _pseudo_css method.

Key optimization:

  • Before: The selector ID was built using multiple string concatenations: "#T_" + uuid + "_row" + str(row) + "_col" + str(col), then concatenated again twice for the two CSS selectors
  • After: The base selector is constructed once using an f-string: f"#T_{uuid}_row{row}_col{col}" and stored in base_selector, then reused via f-string interpolation
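
The change described in the two bullets can be sketched as a pair of standalone functions. This is a reconstruction from the description and from the expected values in the generated tests, not the actual pandas source; the function names `pseudo_css_before` and `pseudo_css_after` are hypothetical.

```python
def pseudo_css_before(uuid: str, name: str, row: int, col: int, text: str) -> list[dict]:
    # Base selector built with chained "+" concatenation.
    selector_id = "#T_" + uuid + "_row" + str(row) + "_col" + str(col)
    return [
        {
            # One more concatenation per selector.
            "selector": selector_id + f":hover .{name}",
            "props": [("visibility", "visible")],
        },
        {
            "selector": selector_id + f" .{name}::after",
            "props": [("content", f'"{text}"')],
        },
    ]


def pseudo_css_after(uuid: str, name: str, row: int, col: int, text: str) -> list[dict]:
    # Base selector built once with a single f-string, then interpolated.
    base_selector = f"#T_{uuid}_row{row}_col{col}"
    return [
        {
            "selector": f"{base_selector}:hover .{name}",
            "props": [("visibility", "visible")],
        },
        {
            "selector": f"{base_selector} .{name}::after",
            "props": [("content", f'"{text}"')],
        },
    ]


# Both variants should produce identical CSS rules for the same inputs.
args = ("123abc", "pd-t", 2, 3, "Tooltip text")
assert pseudo_css_before(*args) == pseudo_css_after(*args)
```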

Why this is faster:

  1. Reduced string operations: Instead of 7 concatenations in total (5 to build the base selector, plus 1 for each of the two CSS selectors), there are now 3 f-string operations
  2. Fewer temporary objects: The original code created multiple intermediate string objects during concatenation, while f-strings are more efficient at building strings in one operation
  3. Better memory usage: Reusing the base selector reduces object creation and garbage collection overhead
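
The operation counts can be checked mechanically by parsing both variants and counting syntax nodes. A small sketch using the standard `ast` module; the two source snippets are reconstructions of the before and after forms described above, not the pandas source, and `count_nodes` is a hypothetical helper.

```python
import ast


def count_nodes(source: str, node_type: type) -> int:
    """Count how many AST nodes of the given type appear in the source."""
    tree = ast.parse(source)
    return sum(isinstance(node, node_type) for node in ast.walk(tree))


# Assumed shape of the original code: chained "+" plus one "+" per selector.
BEFORE = '''
selector_id = "#T_" + uuid + "_row" + str(row) + "_col" + str(col)
hover = selector_id + f":hover .{name}"
after = selector_id + f" .{name}::after"
'''

# Assumed shape of the optimized code: three f-strings and no "+".
AFTER = '''
base_selector = f"#T_{uuid}_row{row}_col{col}"
hover = f"{base_selector}:hover .{name}"
after = f"{base_selector} .{name}::after"
'''

concats_before = count_nodes(BEFORE, ast.BinOp)    # each "+" is one BinOp node
concats_after = count_nodes(AFTER, ast.BinOp)
fstrings_after = count_nodes(AFTER, ast.JoinedStr)  # each f-string is one JoinedStr

print(concats_before, concats_after, fstrings_after)
```

Counting syntax nodes only shows how many operations are written in the source; it says nothing about how long each one takes, which still has to be measured.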

Performance characteristics from tests:

  • Consistent 22-43% improvements across all test cases
  • Particularly effective for scenarios with many calls (large-scale tests showing 34-37% speedups)
  • Benefits scale with usage frequency, making this optimization valuable for DataFrame styling operations that generate many tooltips
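
Per-call timings like these can be reproduced approximately with the standard `timeit` module. The sketch below is not the Codeflash harness; it times two hypothetical stand-in functions that build only the selector strings, and the absolute numbers will vary by machine and Python version.

```python
import timeit


def build_with_concat(uuid: str, name: str, row: int, col: int) -> tuple[str, str]:
    # Stand-in for the original approach: chained "+" concatenation.
    selector_id = "#T_" + uuid + "_row" + str(row) + "_col" + str(col)
    return selector_id + f":hover .{name}", selector_id + f" .{name}::after"


def build_with_fstrings(uuid: str, name: str, row: int, col: int) -> tuple[str, str]:
    # Stand-in for the optimized approach: one f-string per string built.
    base_selector = f"#T_{uuid}_row{row}_col{col}"
    return f"{base_selector}:hover .{name}", f"{base_selector} .{name}::after"


def best_of(func, repeat: int = 5, number: int = 10_000) -> float:
    """Best per-call time in microseconds over `repeat` timing runs."""
    timings = timeit.repeat(
        lambda: func("123abc", "pd-t", 2, 3), repeat=repeat, number=number
    )
    return min(timings) / number * 1e6


concat_us = best_of(build_with_concat)
fstring_us = best_of(build_with_fstrings)
print(f"concat: {concat_us:.3f} us/call, f-string: {fstring_us:.3f} us/call")
```

Taking the minimum over several runs mirrors the "best of N runs" reporting used above and reduces noise from other processes.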

Impact on workloads: This optimization matters most when styling large DataFrames with many tooltip-enabled cells, since the _pseudo_css method is called once per such cell. The saving per call is a fraction of a microsecond in the tests below, so the gain only becomes noticeable when it accumulates over many tooltips.
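
To put the cumulative effect in context, the figures from the 100-call regression test (66.5μs before, 49.5μs after) can be scaled up to a larger table. This is a rough sketch, assuming one `_pseudo_css` call per tooltip cell and a constant per-call cost; the helper name `estimated_saving_ms` is hypothetical.

```python
def estimated_saving_ms(n_tooltip_cells: int, before_us: float, after_us: float) -> float:
    """Estimate total time saved, in milliseconds, for a number of tooltip cells."""
    return n_tooltip_cells * (before_us - after_us) / 1_000


# Per-call cost derived from the 100-call test: total runtime divided by 100.
before_per_call = 66.5 / 100
after_per_call = 49.5 / 100

saving = estimated_saving_ms(1_000_000, before_per_call, after_per_call)
print(f"~{saving:.0f} ms saved for 1,000,000 tooltip cells")  # prints "~170 ms saved ..."
```

This is an extrapolation from a micro-benchmark, so the saving in a real styling call may differ once the other rendering costs are included.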

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 240 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
# imports
from pandas.io.formats.style_render import Tooltips

# unit tests

# 1. Basic Test Cases


def test_basic_typical_values():
    """Test with typical, expected values."""
    t = Tooltips()
    uuid = "123abc"
    name = "pd-t"
    row = 2
    col = 3
    text = "Tooltip text"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.97μs -> 1.37μs (43.5% faster)


def test_basic_different_class_name():
    """Test with a different CSS class name."""
    t = Tooltips()
    uuid = "xyz"
    name = "custom-class"
    row = 0
    col = 0
    text = "Hello"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.83μs -> 1.35μs (35.7% faster)


def test_basic_numeric_text():
    """Test with numeric text (should be converted to string in content)."""
    t = Tooltips()
    uuid = "num"
    name = "pd-t"
    row = 1
    col = 1
    text = 12345
    codeflash_output = t._pseudo_css(uuid, name, row, col, str(text))
    result = codeflash_output  # 1.90μs -> 1.45μs (31.0% faster)


def test_basic_empty_string_text():
    """Test with empty string as tooltip text."""
    t = Tooltips()
    uuid = "empty"
    name = "pd-t"
    row = 4
    col = 5
    text = ""
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.90μs -> 1.43μs (33.1% faster)


def test_basic_special_characters():
    """Test with special characters in tooltip text."""
    t = Tooltips()
    uuid = "special"
    name = "pd-t"
    row = 3
    col = 2
    text = 'Some "quoted" text & <html>'
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.01μs -> 1.49μs (34.9% faster)


# 2. Edge Test Cases


def test_edge_negative_indices():
    """Test with negative row and col indices."""
    t = Tooltips()
    uuid = "neg"
    name = "pd-t"
    row = -1
    col = -5
    text = "Negative"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.06μs -> 1.50μs (36.9% faster)


def test_edge_zero_indices():
    """Test with zero row and col indices."""
    t = Tooltips()
    uuid = "zero"
    name = "pd-t"
    row = 0
    col = 0
    text = "Zero"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.98μs -> 1.50μs (32.3% faster)


def test_edge_empty_uuid_and_name():
    """Test with empty uuid and class name."""
    t = Tooltips()
    uuid = ""
    name = ""
    row = 1
    col = 1
    text = "Test"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.00μs -> 1.56μs (27.7% faster)


def test_edge_long_text():
    """Test with a long tooltip text."""
    t = Tooltips()
    uuid = "long"
    name = "pd-t"
    row = 1
    col = 1
    text = "A" * 500
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.10μs -> 1.68μs (25.4% faster)


def test_edge_unicode_text():
    """Test with unicode characters in tooltip text."""
    t = Tooltips()
    uuid = "uni"
    name = "pd-t"
    row = 1
    col = 1
    text = "你好, мир, hello"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.31μs -> 1.78μs (30.0% faster)


def test_edge_text_with_newlines_and_tabs():
    """Test with newlines and tabs in tooltip text."""
    t = Tooltips()
    uuid = "nl"
    name = "pd-t"
    row = 1
    col = 1
    text = "Line1\nLine2\tTabbed"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.97μs -> 1.46μs (34.7% faster)


def test_edge_text_with_escaped_quotes():
    """Test with text that includes escaped quotes."""
    t = Tooltips()
    uuid = "esc"
    name = "pd-t"
    row = 1
    col = 1
    text = 'He said: \\"hello\\"'
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.01μs -> 1.49μs (34.4% faster)


def test_edge_non_string_uuid_and_name():
    """Test with non-string uuid and name (should be converted to str)."""
    t = Tooltips()
    uuid = 123
    name = 456
    row = 7
    col = 8
    text = "Non-string"
    # The function expects string, so we convert to str
    codeflash_output = t._pseudo_css(str(uuid), str(name), row, col, text)
    result = codeflash_output  # 2.00μs -> 1.64μs (22.0% faster)


def test_edge_large_indices():
    """Test with large row and col indices."""
    t = Tooltips()
    uuid = "large"
    name = "pd-t"
    row = 999
    col = 888
    text = "Large Indices"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.00μs -> 1.58μs (26.9% faster)


def test_edge_text_is_none():
    """Test with text as None (should become 'None' as string)."""
    t = Tooltips()
    uuid = "none"
    name = "pd-t"
    row = 1
    col = 1
    text = None
    codeflash_output = t._pseudo_css(uuid, name, row, col, str(text))
    result = codeflash_output  # 2.04μs -> 1.49μs (37.0% faster)


# 3. Large Scale Test Cases


def test_large_scale_many_calls():
    """Test performance and correctness with many calls and unique values."""
    t = Tooltips()
    uuids = [f"uuid{i}" for i in range(50)]
    names = [f"class{i}" for i in range(50)]
    for i in range(50):
        uuid = uuids[i]
        name = names[i]
        row = i
        col = 49 - i
        text = f"Tooltip {i}"
        codeflash_output = t._pseudo_css(uuid, name, row, col, text)
        result = codeflash_output  # 35.5μs -> 26.5μs (33.8% faster)


def test_large_scale_long_uuids_and_names():
    """Test with very long uuid and class name strings."""
    t = Tooltips()
    uuid = "u" * 200
    name = "c" * 200
    row = 0
    col = 0
    text = "Long names"
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.89μs -> 1.39μs (36.0% faster)


def test_large_scale_maximum_length_text():
    """Test with maximum allowed length for text (e.g., 1000 chars)."""
    t = Tooltips()
    uuid = "max"
    name = "pd-t"
    row = 1
    col = 1
    text = "X" * 1000
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.14μs -> 1.67μs (28.2% faster)


def test_large_scale_varied_inputs():
    """Test with a variety of inputs in a loop."""
    t = Tooltips()
    for i in range(10):
        uuid = f"uuid{i}"
        name = f"name{i}"
        row = i * 3
        col = i * 7
        text = f"Text {i} special !@#{i}"
        codeflash_output = t._pseudo_css(uuid, name, row, col, text)
        result = codeflash_output  # 8.54μs -> 6.40μs (33.6% faster)


def test_large_scale_all_ascii_chars():
    """Test with tooltip text containing all printable ASCII characters."""
    t = Tooltips()
    uuid = "ascii"
    name = "pd-t"
    row = 1
    col = 1
    text = "".join(chr(i) for i in range(32, 127))  # All printable ASCII
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.67μs -> 1.28μs (30.5% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import random  # for generating random test data
import string  # for generating long/edge test strings
# function to test

# imports
from pandas.io.formats.style_render import Tooltips

# unit tests

# --------------------------
# Basic Test Cases
# --------------------------


def test_basic_standard_input():
    """Standard input with typical values."""
    t = Tooltips()
    uuid = "123abc"
    name = "pd-t"
    row = 2
    col = 3
    text = "Tooltip text"
    expected = [
        {
            "selector": "#T_123abc_row2_col3:hover .pd-t",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_123abc_row2_col3 .pd-t::after",
            "props": [("content", '"Tooltip text"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.06μs -> 1.48μs (39.7% faster)
    assert result == expected


def test_basic_different_class_name():
    """Custom css class name."""
    t = Tooltips()
    uuid = "xyz"
    name = "custom-class"
    row = 0
    col = 0
    text = "Hello"
    expected = [
        {
            "selector": "#T_xyz_row0_col0:hover .custom-class",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_xyz_row0_col0 .custom-class::after",
            "props": [("content", '"Hello"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.00μs -> 1.47μs (35.9% faster)
    assert result == expected


def test_basic_numeric_text():
    """Text is a numeric string."""
    t = Tooltips()
    uuid = "u"
    name = "n"
    row = 1
    col = 1
    text = "12345"
    expected = [
        {
            "selector": "#T_u_row1_col1:hover .n",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_u_row1_col1 .n::after",
            "props": [("content", '"12345"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.88μs -> 1.40μs (34.5% faster)
    assert result == expected


def test_basic_empty_text():
    """Text is an empty string."""
    t = Tooltips()
    uuid = "a"
    name = "b"
    row = 4
    col = 5
    text = ""
    expected = [
        {
            "selector": "#T_a_row4_col5:hover .b",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_a_row4_col5 .b::after",
            "props": [("content", '""')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.87μs -> 1.38μs (35.7% faster)
    assert result == expected


def test_basic_special_characters_in_text():
    """Text contains special characters and quotes."""
    t = Tooltips()
    uuid = "u1"
    name = "cls"
    row = 7
    col = 8
    text = "Special chars: !@#$%^&*()_+-=[]{};':\",.<>/?\\|`~"
    expected = [
        {
            "selector": "#T_u1_row7_col8:hover .cls",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_u1_row7_col8 .cls::after",
            "props": [("content", f'"{text}"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.80μs -> 1.41μs (27.9% faster)
    assert result == expected


# --------------------------
# Edge Test Cases
# --------------------------


def test_edge_zero_indices():
    """Row and col indices are zero."""
    t = Tooltips()
    uuid = "zero"
    name = "z"
    row = 0
    col = 0
    text = "Zero indices"
    expected = [
        {
            "selector": "#T_zero_row0_col0:hover .z",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_zero_row0_col0 .z::after",
            "props": [("content", '"Zero indices"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.90μs -> 1.42μs (33.9% faster)
    assert result == expected


def test_edge_negative_indices():
    """Row and col indices are negative."""
    t = Tooltips()
    uuid = "neg"
    name = "negclass"
    row = -1
    col = -2
    text = "Negative indices"
    expected = [
        {
            "selector": "#T_neg_row-1_col-2:hover .negclass",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_neg_row-1_col-2 .negclass::after",
            "props": [("content", '"Negative indices"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.98μs -> 1.45μs (36.5% faster)
    assert result == expected


def test_edge_large_indices():
    """Row and col indices are very large numbers."""
    t = Tooltips()
    uuid = "big"
    name = "bigclass"
    row = 999999
    col = 888888
    text = "Large indices"
    expected = [
        {
            "selector": "#T_big_row999999_col888888:hover .bigclass",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_big_row999999_col888888 .bigclass::after",
            "props": [("content", '"Large indices"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 2.08μs -> 1.54μs (35.1% faster)
    assert result == expected


def test_edge_unicode_in_text():
    """Text contains unicode characters."""
    t = Tooltips()
    uuid = "uni"
    name = "unicode"
    row = 5
    col = 6
    text = "Unicode: 測試, тест, اختبار, 😀"
    expected = [
        {
            "selector": "#T_uni_row5_col6:hover .unicode",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_uni_row5_col6 .unicode::after",
            "props": [("content", f'"{text}"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.98μs -> 1.50μs (32.2% faster)
    assert result == expected


def test_edge_empty_uuid_and_class():
    """UUID and class name are empty strings."""
    t = Tooltips()
    uuid = ""
    name = ""
    row = 1
    col = 2
    text = "Empty uuid and class"
    expected = [
        {
            "selector": "#T__row1_col2:hover .",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T__row1_col2 .::after",
            "props": [("content", '"Empty uuid and class"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.89μs -> 1.43μs (32.5% faster)
    assert result == expected


def test_edge_text_with_newlines_and_tabs():
    """Text contains newlines and tabs."""
    t = Tooltips()
    uuid = "nl"
    name = "tab"
    row = 3
    col = 4
    text = "Line1\nLine2\tTabbed"
    expected = [
        {
            "selector": "#T_nl_row3_col4:hover .tab",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_nl_row3_col4 .tab::after",
            "props": [("content", f'"{text}"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.94μs -> 1.47μs (31.9% faster)
    assert result == expected


def test_edge_text_with_double_and_single_quotes():
    """Text contains both double and single quotes."""
    t = Tooltips()
    uuid = "q"
    name = "quotes"
    row = 9
    col = 10
    text = "She said: \"Hello\", then 'Bye'"
    expected = [
        {
            "selector": "#T_q_row9_col10:hover .quotes",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_q_row9_col10 .quotes::after",
            "props": [("content", f'"{text}"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.91μs -> 1.48μs (28.9% faster)
    assert result == expected


# --------------------------
# Large Scale Test Cases
# --------------------------


def test_large_scale_many_invocations():
    """Test function correctness and performance with many different calls."""
    t = Tooltips()
    uuid = "large"
    name = "scale"
    for i in range(100):
        row = i
        col = 99 - i
        text = f"Tooltip {i}"
        expected = [
            {
                "selector": f"#T_large_row{row}_col{col}:hover .scale",
                "props": [("visibility", "visible")],
            },
            {
                "selector": f"#T_large_row{row}_col{col} .scale::after",
                "props": [("content", f'"Tooltip {i}"')],
            },
        ]
        codeflash_output = t._pseudo_css(uuid, name, row, col, text)
        result = codeflash_output  # 66.5μs -> 49.5μs (34.3% faster)
        assert result == expected


def test_large_scale_long_text():
    """Test with a very long tooltip text."""
    t = Tooltips()
    uuid = "long"
    name = "longtext"
    row = 1
    col = 1
    long_text = "".join(random.choices(string.ascii_letters + string.digits, k=500))
    expected = [
        {
            "selector": "#T_long_row1_col1:hover .longtext",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_long_row1_col1 .longtext::after",
            "props": [("content", f'"{long_text}"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, long_text)
    result = codeflash_output  # 1.89μs -> 1.43μs (32.6% faster)
    assert result == expected


def test_large_scale_long_uuid_and_class():
    """Test with long uuid and class names."""
    t = Tooltips()
    uuid = "".join(random.choices(string.ascii_lowercase, k=100))
    name = "".join(random.choices(string.ascii_letters, k=50))
    row = 42
    col = 24
    text = "Long uuid and class"
    expected = [
        {
            "selector": f"#T_{uuid}_row42_col24:hover .{name}",
            "props": [("visibility", "visible")],
        },
        {
            "selector": f"#T_{uuid}_row42_col24 .{name}::after",
            "props": [("content", '"Long uuid and class"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.82μs -> 1.32μs (37.5% faster)
    assert result == expected


def test_large_scale_all_ascii_printable_in_text():
    """Test with all printable ASCII characters in the text."""
    t = Tooltips()
    uuid = "ascii"
    name = "printable"
    row = 0
    col = 0
    text = "".join(
        [c for c in string.printable if c not in "\r"]
    )  # avoid carriage return
    expected = [
        {
            "selector": "#T_ascii_row0_col0:hover .printable",
            "props": [("visibility", "visible")],
        },
        {
            "selector": "#T_ascii_row0_col0 .printable::after",
            "props": [("content", f'"{text}"')],
        },
    ]
    codeflash_output = t._pseudo_css(uuid, name, row, col, text)
    result = codeflash_output  # 1.83μs -> 1.28μs (42.1% faster)
    assert result == expected


def test_large_scale_randomized_inputs():
    """Test with randomized inputs for uuid, name, row, col, and text."""
    t = Tooltips()
    for _ in range(10):  # 10 random samples
        uuid = "".join(random.choices(string.ascii_letters + string.digits, k=8))
        name = "".join(random.choices(string.ascii_letters, k=6))
        row = random.randint(-100, 100)
        col = random.randint(-100, 100)
        text = "".join(random.choices(string.ascii_letters + string.digits + " ", k=20))
        expected = [
            {
                "selector": f"#T_{uuid}_row{row}_col{col}:hover .{name}",
                "props": [("visibility", "visible")],
            },
            {
                "selector": f"#T_{uuid}_row{row}_col{col} .{name}::after",
                "props": [("content", f'"{text}"')],
            },
        ]
        codeflash_output = t._pseudo_css(uuid, name, row, col, text)
        result = codeflash_output  # 8.77μs -> 6.42μs (36.6% faster)
        assert result == expected


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-Tooltips._pseudo_css-mio7oay7 and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 December 2, 2025 06:43
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Dec 2, 2025